1 Introduction

According to the UNHCR, refugees are people who have been forced to leave their country because of violence, war, or persecution based on race, religion, nationality, political opinion, or membership in a particular social group. There are currently 25.9 million refugees in the world, a figure that reflects dramatic growth over the past decade. This led us to ask how refugee resettlement has changed over the past decade, and to look deeper than the changes in the number of refugees alone.

The United States has historically been one of the largest resettlers of refugees. However, in 2017 the US resettled fewer refugees than the rest of the world combined. We are interested in answering the following questions to gain a better understanding of refugee resettlement in the United States:

  1. What insights can we gain from temporal exploratory data analysis of refugee settlement patterns in the US? Have there been increases/decreases in refugee settlements from 2009 - 2018?
  2. What insights can we gain from geographical visualization of refugee settlement patterns in the US? Why might some states have larger refugee settlements than others?
  3. What changes in demographic patterns (i.e., religion, gender, age, etc.) within the refugee population can we visualize? Can we observe any relationship between certain demographics and refugee settlements?

2 Data Sources

We first collected data from the Refugee Processing Center (RPC), which provides refugee arrival information by state and nationality, by destination and nationality, by nationality and religion, and by demographic profile.

The RPC website lets users select the time frame and the nationality. Because it does not allow faceting by year, we downloaded the files year by year and cleaned them into the format we needed for analysis.

3 Data Transformation

3.1 Cleaning the Data in R

  1. From the website, we downloaded ‘.xlsx’ files. Raw files can be found here.
  2. We wrote two functions to clean the Excel sheet for a given year:
    1. clean_arrival cleans the Excel files of total refugee resettlements for each state.
    2. clean_demographics cleans the Excel files of demographic information for refugees from specific countries (namely Bhutan, Burma, DRC, Iraq, and Somalia).
  3. We wrote another function, combine_files, to combine each year’s Excel file into one.
  4. We saved the results as csv files and uploaded them to GitHub for easy access.
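The text names the cleaning functions but does not show them, so the following is only a minimal sketch of what the clean-then-combine steps might look like in base R. The names clean_demographics_sketch and combine_files_sketch, the blank spacer row, and the thousands separators in the raw counts are all assumptions made for illustration. The input values are the Bhutan rows from the ethnicity.csv excerpt, written as in-memory data frames so the example runs without the Excel files.

```r
# Hypothetical sketch; the authors' actual functions are not shown in the text.
# `raw` stands in for one year's sheet after it has been read from Excel.
clean_demographics_sketch <- function(raw, country, year) {
  # Drop blank spacer rows (an assumed quirk of the raw sheets).
  keep <- !is.na(raw[[1]]) & nzchar(trimws(raw[[1]]))
  out <- raw[keep, , drop = FALSE]
  out[[1]] <- trimws(out[[1]])
  # Counts may arrive as text with thousands separators (assumed).
  for (col in c("Male", "Female", "Total")) {
    out[[col]] <- as.integer(gsub(",", "", out[[col]]))
  }
  out$country <- country
  out$Year <- as.integer(year)
  rownames(out) <- NULL
  out
}

# Hypothetical sketch of the combining step: clean each year, then stack.
combine_files_sketch <- function(raw_by_year, country) {
  cleaned <- Map(
    function(raw, year) clean_demographics_sketch(raw, country, year),
    raw_by_year, names(raw_by_year)
  )
  combined <- do.call(rbind, unname(cleaned))
  rownames(combined) <- NULL
  combined
}

# Stand-ins for two raw yearly sheets (values from the ethnicity.csv excerpt).
raw_by_year <- list(
  "2009" = data.frame(
    Ethnicity = c("Lhotsampa", "", "Other"),
    Male = c("7,373", NA, "12"),
    Female = c("7,677", NA, "15"),
    Total = c("15,050", NA, "27"),
    stringsAsFactors = FALSE
  ),
  "2010" = data.frame(
    Ethnicity = c("Lhotsampa", "Other"),
    Male = c("5,842", "3"),
    Female = c("5,881", "3"),
    Total = c("11,723", "6"),
    stringsAsFactors = FALSE
  )
)

ethnicity <- combine_files_sketch(raw_by_year, country = "Bhutan")
print(ethnicity)
```

Keeping the per-year cleaning separate from the stacking step means a change in one year's sheet layout only touches the cleaning function.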

3.2 Cleaned Data Format

After cleaning the data, we have six ‘.csv’ files that can be found here.

  1. all_arrivals.csv: The total number of refugee resettlements to each of the 50 states in the US from 2009-2018. All raw files to make this file can be found here.

     | State | Cases | Inds | Year |
     |---|---|---|---|
     | California | 5524 | 11512 | 2009 |
     | Texas | 3638 | 8826 | 2009 |
     | New York | 2013 | 5003 | 2009 |
     | Arizona | 1952 | 4543 | 2009 |
     | Florida | 1834 | 4196 | 2009 |
     | Michigan | 1602 | 3460 | 2009 |

  2. age_group.csv:

     | Age.Group | Male | Female | Total | country | Year |
     |---|---|---|---|---|---|
     | Under 14 | 1559 | 1627 | 3186 | Bhutan | 2009 |
     | Age 14 to 20 | 1258 | 1310 | 2568 | Bhutan | 2009 |
     | Age 21 to 30 | 1823 | 1927 | 3750 | Bhutan | 2009 |
     | Age 31 to 40 | 1110 | 1124 | 2234 | Bhutan | 2009 |
     | Age 41 to 50 | 726 | 737 | 1463 | Bhutan | 2009 |
     | Age 51 to 64 | 583 | 626 | 1209 | Bhutan | 2009 |

  3. education.csv:

     | Education | Male | Female | Total | country | Year |
     |---|---|---|---|---|---|
     | Bio Data not Complete | 2657 | 1846 | 4503 | Bhutan | 2009 |
     | Graduate School | 21 | 144 | 165 | Bhutan | 2009 |
     | Intermediate | 509 | 496 | 1005 | Bhutan | 2009 |
     | Kindergarten | 123 | 127 | 250 | Bhutan | 2009 |
     | NONE | 100 | 42 | 142 | Bhutan | 2009 |
     | Pre-University | 1 | 1 | 2 | Bhutan | 2009 |

  4. ethnicity.csv:

     | Ethnicity | Male | Female | Total | country | Year |
     |---|---|---|---|---|---|
     | Lhotsampa | 7373 | 7677 | 15050 | Bhutan | 2009 |
     | Other | 12 | 15 | 27 | Bhutan | 2009 |
     | Lhotsampa | 5842 | 5881 | 11723 | Bhutan | 2010 |
     | Other | 3 | 3 | 6 | Bhutan | 2010 |
     | Lhotsampa | 7314 | 7410 | 14724 | Bhutan | 2011 |
     | Other | 4 | 7 | 11 | Bhutan | 2011 |

  5. native_language.csv:

     | Native.Language | Male | Female | Total | country | Year |
     |---|---|---|---|---|---|
     | Bio Data not Complete | 3 | 4 | 7 | Bhutan | 2009 |
     | Dzongka | 0 | 1 | 1 | Bhutan | 2009 |
     | English | 2 | 1 | 3 | Bhutan | 2009 |
     | Hindi | 1 | 0 | 1 | Bhutan | 2009 |
     | Marathi | 1 | 0 | 1 | Bhutan | 2009 |
     | Napoletano-Calabrese | 0 | 1 | 1 | Bhutan | 2009 |

  6. religion.csv:

     | Religion | Male | Female | Total | country | Year |
     |---|---|---|---|---|---|
     | Buddhist | 748 | 853 | 1601 | Bhutan | 2009 |
     | Christian | 534 | 518 | 1052 | Bhutan | 2009 |
     | Hindu | 5798 | 5993 | 11791 | Bhutan | 2009 |
     | Kirat | 305 | 328 | 633 | Bhutan | 2009 |
     | Buddhist | 925 | 910 | 1835 | Bhutan | 2010 |
     | Christian | 468 | 453 | 921 | Bhutan | 2010 |
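Because the five demographic files share the same Male, Female, Total, country, and Year columns, a quick consistency check is possible once they are loaded. The sketch below is not part of the original analysis; it rebuilds the religion.csv and ethnicity.csv excerpts shown above as data frames (in practice they would come from read.csv) and checks that Male plus Female equals Total in every row, and that the two files agree on the Bhutan total for 2009.

```r
# Rows copied from the religion.csv excerpt above.
religion <- data.frame(
  Religion = c("Buddhist", "Christian", "Hindu", "Kirat", "Buddhist", "Christian"),
  Male     = c(748, 534, 5798, 305, 925, 468),
  Female   = c(853, 518, 5993, 328, 910, 453),
  Total    = c(1601, 1052, 11791, 633, 1835, 921),
  country  = "Bhutan",
  Year     = c(2009, 2009, 2009, 2009, 2010, 2010),
  stringsAsFactors = FALSE
)

# 2009 rows copied from the ethnicity.csv excerpt above.
ethnicity <- data.frame(
  Ethnicity = c("Lhotsampa", "Other"),
  Male      = c(7373, 12),
  Female    = c(7677, 15),
  Total     = c(15050, 27),
  country   = "Bhutan",
  Year      = 2009,
  stringsAsFactors = FALSE
)

# Row-level check: the Total column should equal Male + Female.
mismatch <- religion[religion$Male + religion$Female != religion$Total, ]
print(nrow(mismatch))

# Cross-file check: both files should count the same people for a given
# country and year, as long as every category for that year is present.
religion_2009  <- sum(religion$Total[religion$Year == 2009])
ethnicity_2009 <- sum(ethnicity$Total[ethnicity$Year == 2009])
print(c(religion = religion_2009, ethnicity = ethnicity_2009))
```

The cross-file comparison only makes sense for years where all categories are included; the 2010 rows of the religion excerpt, for example, are incomplete.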

4 Missing Values

The datasets from the Refugee Processing Center (RPC) did not contain any missing values. However, we noticed that our data contained a row labeled “Unknown State”. Since it could not be plotted on our maps, we decided to remove it. Additionally, when we converted the State column to a factor, it had 56 levels rather than 50. The six extra levels are:

  • American Samoa
  • District of Columbia
  • Guam
  • Puerto Rico
  • Unknown State
  • Virgin Islands

We removed these rows because our analysis covers only the fifty states.
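One way to perform this filtering, sketched here under the assumption that the State column uses the standard state names, is to compare against R's built-in state.name vector. The data frame below holds only the State labels, not real arrival counts, and the variable names are illustrative.

```r
# The six non-state labels listed above.
extra <- c("American Samoa", "District of Columbia", "Guam",
           "Puerto Rico", "Unknown State", "Virgin Islands")

# Stand-in for the arrivals data: one row per State label, no counts.
arrivals <- data.frame(State = c(state.name, extra), stringsAsFactors = FALSE)
print(nlevels(factor(arrivals$State)))  # 56 levels before filtering

# Keep only rows whose State is one of the fifty states.
fifty <- arrivals[arrivals$State %in% state.name, , drop = FALSE]
fifty$State <- factor(fifty$State)
print(nlevels(fifty$State))             # 50 levels after filtering
```

Filtering on a whitelist of state names, rather than dropping the six labels by name, also guards against any other unexpected label appearing in a later year's file.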

5 Results

We take a closer look at the top five countries of origin of refugees resettled in the US (https://data.newamericaneconomy.org/en/refugee-resettlement-us/).

The five largest religions in the world are Christianity, Islam, Hinduism, Buddhism, and Sikhism (https://thecountriesof.com/top-5-largest-religions-in-the-world/). The data for Iraq split the Muslim population into three categories: Muslim, Muslim Shiite, and Muslim Sunni. For the purposes of this analysis, we combine them into one.
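Combining the Muslim categories could be done roughly as follows. This is a sketch rather than the code used in the analysis: the function name collapse_muslim is hypothetical, the category labels are taken from the prose and may be spelled differently in the files, and the counts are placeholders chosen only to show the aggregation, not RPC figures.

```r
# Map every label starting with "Muslim" to the single label "Muslim".
# Matching on the prefix avoids depending on the exact sub-category spelling.
collapse_muslim <- function(religion) {
  ifelse(grepl("^Muslim", religion), "Muslim", religion)
}

# Placeholder counts for illustration only; these are NOT RPC figures.
iraq <- data.frame(
  Religion = c("Christian", "Muslim", "Muslim Shiite", "Muslim Sunni"),
  Total    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

iraq$Religion <- collapse_muslim(iraq$Religion)

# Sum the counts that now share a label.
combined <- aggregate(Total ~ Religion, data = iraq, FUN = sum)
print(combined)
```

After relabeling, the counts have to be re-aggregated; otherwise the plot data would contain several rows with the same Religion value for a single country and year.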

6 Interactive Component

Shiny applications not supported in static R Markdown documents

7 Conclusion